
Changes from Cole Trapnell #110

Merged
merged 14 commits on Nov 28, 2023

Conversation

brgew
Contributor

@brgew brgew commented Oct 26, 2023

There are essentially three groups of changes (in order of most important to least important for our lab):

  1. The computation and storage of the variance-covariance matrix on model parameters (theta) when using the bootstrap.
  2. An expanded interface for configuring the model initialization, to allow for customization of the "post-treatment" steps.
  3. Changes to the torch optimizer to smooth the way for running on GPUs.

Member

@jchiquet jchiquet left a comment


Some very minor comments that need a quick explanation before approval.

@@ -105,7 +105,7 @@ PLN_param <- function(
Omega = NULL,
config_post = list(),
config_optim = list(),
inception = NULL # pretrained PLNfit used as initialization
inception = NULL # pretrained PLNfit used as initialization,
Member

Why the additional comma if it's the last element of the list?

Contributor

oops :)

# CHECK_ME_TORCH_GPU
# This appears to be in torch_gpu only. The commented out line below is
# in both PLNmodels/master and PLNmodels/dev.
myPLN <- switch(control$covariance,
Member

Not sure why one would need to init with a diagonal or spherical PLN covariance model, when the goal is to fit a PLNnetwork model, that is, one with sparse precision/covariance matrix... But if used for completeness/compatibility, that is fine.

Collaborator

I have exactly the same question. Diagonal covariance matrix would result in an empty graph (with only isolated nodes).

Contributor

@maddyduran and I seem to remember issues when initializing PLNnetwork when you have fewer samples than species - if you use "fixed" there's a call to solve() that can throw an exception. We figured that "spherical" or diagonal would avoid this (skipping that call) and move on to estimating a penalized inverse covariance matrix even with a spherical or diagonal init.

Collaborator

@mahendra-mariadassou mahendra-mariadassou left a comment


Thank you very much for the nice PR and many changes, including the brand new jackknife estimator for the B matrix.

Like @jchiquet, I have a few comments before approving the PR. The main one concerns changes to PLNnetwork which I'm not sure are so useful: since a PLNnetworkfit is always an object of class PLNfit_fixedcov, there is limited use in passing a different variance structure in the control list (and it increases the potential for making mistakes).

@@ -107,6 +108,11 @@ trace <- function(x) sum(diag(x))
x
}

.logfactorial_torch <- function(n){
n[n == 0] <- 1 ## 0! = 1!
n*torch_log(n) - n + torch_log(8*torch_pow(n,3) + 4*torch_pow(n,2) + n + 1/30)/6 + log(pi)/2
Collaborator

For .logfactorial_torch(), shouldn't the final term log(pi)/2 be torch_log(pi)/2 ?

Contributor

Yes, though I'm not actually sure what would happen under the hood here in terms of evaluation.

.5 * torch_sum(torch_mm(params$M, params$Omega) * params$M + S2 * torch_diag(params$Omega), dim = 2)
Ji_tmp = Ji_tmp$cpu()
Ji_tmp = as.numeric(Ji_tmp)
Ji <- .5 * self$p - rowSums(.logfactorial(as.matrix(data$Y$cpu()))) + Ji_tmp
Collaborator

Why not use .logfactorial_torch() instead of .logfactorial()? Because the computation has already moved back to the CPU at this point?

Contributor

Good question. Perhaps it would be better to defer that so we can use .logfactorial_torch() instead

@@ -156,7 +159,7 @@ PLNfit <- R6Class(

## Check for convergence
if (delta_f < config$ftol_rel) status <- 3
if (delta_x < config$xtol_rel) status <- 4
#if (delta_x < config$xtol_rel) status <- 4
Collaborator

Is there a reason to remove the convergence check on the parameter values and keep only the one on the ELBO value? This will speed up the algorithm (fewer conditions to satisfy) but may cause the nlopt and torch implementations to diverge in the results they produce (not a bad thing per se, but something we need to be aware of).

Contributor

I defer to you on that - we have been generally happy with larger default values of ftol_rel, triggering convergence on status 3 most of the time.
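For what it's worth, one way to keep both criteria while letting users effectively disable the parameter-based one is to leave the check in place and rely on the tolerance value (a sketch based on the snippet quoted above; config field names as in that snippet):

```r
## Check for convergence; setting config$xtol_rel to Inf effectively
## disables the parameter-based criterion without removing the code path.
if (delta_f < config$ftol_rel) status <- 3
if (delta_x < config$xtol_rel) status <- 4
```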

@@ -217,6 +220,54 @@ PLNfit <- R6Class(
invisible(list(var_B = var_B, var_Omega = var_Omega))
},

compute_vcov_from_resamples = function(resamples){
# compute the covariance of the parameters
get_cov_mat = function(data, cell_group) {
Collaborator

I may be mistaken, but get_cov_mat() appears to be defined but never used anywhere in compute_vcov_from_resamples(). Is it necessary ?

Contributor

Vestigial code from debugging - can be removed

nullModel <- nullModelPoisson(self$responses, self$covariates, self$offsets, self$weights)
#' @param config_post a list for controlling the post-treatment.
postTreatment = function(config_post, config_optim) {
#nullModel <- nullModelPoisson(self$responses, self$covariates, self$offsets, self$weights)
Collaborator

No comparison to a null Poisson model in the general post-treatment of PLN families. Is it to improve speed / because it's not required for the post-treatments?

Contributor

In our applications, the call to glm.fit inside of nullModelPoisson() can sometimes throw an exception, which ruins a perfectly good PLN fit! Since it seemed to us that nullModelPoisson() was something one only needed to do when (optionally) computing the approximate R2, we thought this call was superfluous.
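If the null model were kept, one defensive option would be to guard the call so a glm.fit failure can't take down the whole fit (a sketch only, reusing the nullModelPoisson() call quoted above; the warning message is illustrative):

```r
# Don't let a glm.fit failure inside nullModelPoisson() ruin the PLN fit:
# fall back to NULL and warn, so the approximate R2 is simply unavailable.
nullModel <- tryCatch(
  nullModelPoisson(self$responses, self$covariates, self$offsets, self$weights),
  error = function(e) {
    warning("null Poisson model failed: ", conditionMessage(e))
    NULL
  }
)
```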

variance_jackknife = function(Y, X, O, w, config = config_default_nlopt) {
jacks <- future.apply::future_lapply(seq_len(self$n), function(i) {
jacks <- lapply(seq_len(self$n), function(i) {
Collaborator

I have no strong opinion on this one as @jchiquet was the one who wrote this part, but is there a reason to prefer lapply to future_lapply (one less dependency? simpler to use?). A nice thing about future_lapply is that it is backend-agnostic and can be used with several parallelization paradigms.

Contributor

Ah, yes, agree it would be wonderful to keep this. The issue is that on machines that use OpenBLAS with a multithreaded backend, using future can deadlock the session. A workaround is to wrap calls to future with something like this:

old_omp_num_threads = as.numeric(Sys.getenv("OMP_NUM_THREADS"))
if (is.na(old_omp_num_threads)) {
  old_omp_num_threads = 1
}
RhpcBLASctl::omp_set_num_threads(1)

old_blas_num_threads = as.numeric(Sys.getenv("OPENBLAS_NUM_THREADS"))
if (is.na(old_blas_num_threads)) {
  old_blas_num_threads = 1
}
RhpcBLASctl::blas_set_num_threads(1)

Then you do the work with future, and finally restore the old settings:

RhpcBLASctl::omp_set_num_threads(old_omp_num_threads)
RhpcBLASctl::blas_set_num_threads(old_blas_num_threads)

We didn't add this because we didn't want to add a new dependency on RhpcBLASctl to the package, but you could do so if you want to be able to do linear algebra inside functions called by future.
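The save/pin/restore steps above could be wrapped into a single helper around the future call (a sketch; the helper name `with_single_threaded_blas` is hypothetical, and this assumes RhpcBLASctl is available):

```r
# Hypothetical helper: evaluate `expr` with OpenMP/OpenBLAS pinned to one
# thread, restoring the previous thread counts on exit (even on error).
with_single_threaded_blas <- function(expr) {
  old_omp  <- as.numeric(Sys.getenv("OMP_NUM_THREADS"))
  old_blas <- as.numeric(Sys.getenv("OPENBLAS_NUM_THREADS"))
  if (is.na(old_omp))  old_omp  <- 1
  if (is.na(old_blas)) old_blas <- 1
  RhpcBLASctl::omp_set_num_threads(1)
  RhpcBLASctl::blas_set_num_threads(1)
  on.exit({
    RhpcBLASctl::omp_set_num_threads(old_omp)
    RhpcBLASctl::blas_set_num_threads(old_blas)
  })
  force(expr)
}

# usage (illustrative):
# jacks <- with_single_threaded_blas(
#   future.apply::future_lapply(seq_len(self$n), function(i) ...)
# )
```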

data <- list(Y = Y[resample, , drop = FALSE],
X = X[resample, , drop = FALSE],
O = O[resample, , drop = FALSE],
w = w[resample])
#print (config$torch_device)
#print (config)
if (config$algorithm %in% c("RPROP", "RMSPROP", "ADAM", "ADAGRAD")) # hack, to know if we're doing torch or not
Collaborator

Should we add a backend element (set to the appropriate value) to config_default_nlopt and config_default_torch so that they are self-aware and we can use
if (config$backend == "torch") {...}

Contributor

That would be better. It would also be good to be able to specify the torch device (e.g. "mps", "cuda", etc.).
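A sketch of what that could look like, assuming config_default_nlopt and config_default_torch are plain lists (the torch_device field name follows the config$torch_device used elsewhere in the PR; the exact shape is illustrative):

```r
# Illustrative: make the default configs self-aware about their backend
config_default_nlopt <- utils::modifyList(
  config_default_nlopt, list(backend = "nlopt"))
config_default_torch <- utils::modifyList(
  config_default_torch, list(backend = "torch",
                             torch_device = "cpu"))  # or "cuda", "mps"

# then, instead of matching on optimizer algorithm names:
if (config$backend == "torch") {
  # dispatch to the torch optimizer on config$torch_device
} else {
  # dispatch to nlopt
}
```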

args <- list(data = data,
params = list(B = private$B, M = matrix(0,self$n,self$p), S = private$S[resample, ]),
config = config)
if (config$algorithm %in% c("RPROP", "RMSPROP", "ADAM", "ADAGRAD")) # hack, to know if we're doing torch or not
Collaborator

Same as previous comment

@@ -72,18 +73,25 @@ PLNnetwork <- function(formula, data, subset, weights, penalties = NULL, control
#' @seealso [PLN_param()]
#' @export
PLNnetwork_param <- function(
backend = "nlopt",
backend = c("nlopt", "torch"),
covariance = c("fixed", "spherical", "diagonal"),
Collaborator

I'm mixed about this: to avoid potential problems, it would be better to allow only covariance = "fixed" (especially since f801ca6 reverts back to PLNnetworkfit <- R6Class(inherit = PLNfit_fixedcov)).

Contributor

@maddyduran and I discussed this and we think the use case for covariance = "spherical" or "diagonal" in PLNnetwork is when you have fewer samples than you have species. IIRC, the issue was the call to solve() in the "fixed" optimize() function.

@ctrapnell
Contributor

@mahendra-mariadassou thanks for the question. I replied to this question in the wrong place yesterday, but I think the answer is that the use case for covariance = "spherical" or "diagonal" in PLNnetwork is when you have fewer samples than you have species. IIRC, the issue was the call to solve() in the "fixed" optimize() function. If n < p, that solve() can throw an exception, never making it to the sparse estimation (which can deal with the n < p case, at least insofar as returning an answer).
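A minimal illustration of the failure mode being described (plain base R, not PLNmodels code): with fewer samples than species, the empirical covariance is rank-deficient, so a direct solve() errors out.

```r
set.seed(1)
n <- 5; p <- 20                       # fewer samples than species
Y <- matrix(rpois(n * p, lambda = 5), n, p)
S <- cov(Y)                           # rank at most n - 1 < p, hence singular
res <- try(solve(S), silent = TRUE)   # "system is computationally singular"
inherits(res, "try-error")            # TRUE
```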

@mahendra-mariadassou
Collaborator

mahendra-mariadassou commented Nov 13, 2023

Hi,

We made another pass with @jchiquet and made a few suggestions on a new tweaks branch. Since I'm not a github expert, I branched tweaks from PLN-team:master, pulled cole-trapnell-lab:master into it, and then pushed a few commits (so that it has our suggestions on top of your PR).

Here are the suggestions:

  • [44a1cbd] Slight changes to the PLNnetwork inception to make it obvious that the covariance structure is only enforced in the inception model and not in the fitted model
  • [48b5925] use config_post for all PLN*() functions
  • [6e93d99] and [772b3ad] Restore the convergence check on parameter value updates (as it can be deactivated by setting xtol_rel to Inf) and the use of future_lapply (@jchiquet will open a specific issue for the problem you mentioned when using OpenBLAS, but we'd like to fix it everywhere future_lapply is used)
  • [8ccf9af] Add a backend parameter to config_optim so that one can check whether the function is running with nlopt or torch

You can pull PLN-team:tweaks into your master and make additional changes if you want, this should automatically update this PR.

@jchiquet
Member

jchiquet commented Nov 14, 2023

To complete @mahendra-mariadassou's answer regarding lapply vs future_lapply: I suggest, as Mahendra said, that we open a specific issue, since it concerns all parallelizable actions. Like Cole, I had noted that using future with a multi-threaded BLAS/LAPACK library (such as OpenBLAS) could have detrimental consequences. I therefore suggest defining a lapply wrapper which, depending on the architecture in place, dispatches to a classic or multicore lapply. We should check whether future can handle this ('sequential' or 'multicore' plan).

This is now referenced as Issue #111
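Such a dispatcher could look roughly like this (a sketch only; the helper name `safe_lapply` and the single-thread test are illustrative, and it assumes RhpcBLASctl and future.apply are available):

```r
# Illustrative dispatcher: fall back to plain lapply when a multi-threaded
# BLAS is detected (to avoid the OpenBLAS/future deadlock), otherwise use
# the backend-agnostic future_lapply.
safe_lapply <- function(X, FUN, ...) {
  blas_threads <- tryCatch(RhpcBLASctl::blas_get_num_procs(),
                           error = function(e) 1)
  if (blas_threads > 1) {
    lapply(X, FUN, ...)
  } else {
    future.apply::future_lapply(X, FUN, ..., future.seed = TRUE)
  }
}
```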

@ctrapnell
Contributor

Hi PLN-team. Thanks for creating this. We have pulled this branch into our fork, merged in the changes from the main fork's master, and run our tests. All looks great to us!

@mahendra-mariadassou mahendra-mariadassou merged commit 227c6da into PLN-team:master Nov 28, 2023
@mahendra-mariadassou
Collaborator

Hi @ctrapnell. You probably received an automatic email from github, but since I didn't use a standard workflow and just in case. I merged all your changes (amended with our changes in PLNteam:tweaks) into master, which automatically closed the PR.

Thanks again for the PR, the various changes and more generally your interest in PLNmodels !

@ctrapnell
Contributor

ctrapnell commented Nov 28, 2023 via email

@jchiquet
Member

Thank you all very much. I'm preparing a version for CRAN today, now that Mahendra has done most of the work.

@jchiquet
Member

@ctrapnell, @maddyduran and @brgew we'd like to add you to the list of package contributors, do you agree? If so, I'll need an e-mail address for each of you (I get [email protected], [email protected] and [email protected], is this correct?)

@brgew
Contributor Author

brgew commented Nov 29, 2023 via email

@jchiquet
Member

jchiquet commented Dec 2, 2023

Thank you @brgew for your answer! I assume then that @maddyduran and @ctrapnell, who actually contributed to the PR, would like to be added as contributors.
Best

@ctrapnell
Contributor

ctrapnell commented Dec 4, 2023 via email
